Multi-topic Text Categorization Based on Ranking Approach
Abstract
This paper addresses the multi-topic (multi-label) text classification problem. We propose two methods for deriving multi-label decisions from the output of a ranking method. Unlike existing multi-label classification methods based on reduction from the ranking problem, which define a complex classification (threshold) function on the input feature space, our approach constructs a simple (linear) multi-label classification function that takes the output of the ranking method (the class relevance space) as its input. The first method estimates a linear threshold function defined on the class relevance space. The second method directly finds a linear operator that maps class ranks to the values of binary multi-label decision functions. The proposed methods are computationally cheaper than existing methods while achieving similar, and in some cases significantly better, accuracy. This is demonstrated experimentally on the well-known multi-label benchmark dataset Reuters-2000 (multi-topic text articles).

Index Terms: data mining, machine learning, multi-label classification, ranking methods, text categorization.
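The two ideas in the abstract can be illustrated with a minimal sketch. The abstract does not say how the linear functions are estimated, so everything below is an assumption made for illustration: the ridge-regularised least-squares fit, the midpoint rule for the target threshold, the toy scores, and all function names are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: fitting procedure, names, and data are assumptions.

def solve(a, b):
    """Solve a @ x = b (a square, b a matrix) by Gauss-Jordan with partial pivoting."""
    n = len(a)
    m = [ra[:] + rb[:] for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [[v / m[i][i] for v in m[i][n:]] for i in range(n)]

def fit_linear(scores, targets, ridge=1e-6):
    """Ridge least squares from [relevance scores, 1] to targets (one row per document)."""
    x = [row + [1.0] for row in scores]  # append a bias feature
    d, k = len(x[0]), len(targets[0])
    xtx = [[sum(r[i] * r[j] for r in x) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [[sum(r[i] * t[c] for r, t in zip(x, targets)) for c in range(k)]
           for i in range(d)]
    return solve(xtx, xty)  # d x k weight matrix

def apply_linear(weights, score_row):
    x = score_row + [1.0]
    return [sum(xi * w[c] for xi, w in zip(x, weights))
            for c in range(len(weights[0]))]

# Method 1 (sketch): a linear threshold function on the class relevance space.
def threshold_target(score_row, label_row):
    """Assumed target: midpoint between the highest irrelevant and lowest relevant score.
    If the ranking is imperfect, this midpoint does not separate the two groups."""
    rel = [s for s, y in zip(score_row, label_row) if y]
    irr = [s for s, y in zip(score_row, label_row) if not y]
    lo = max(irr) if irr else min(rel) - 1.0
    hi = min(rel) if rel else max(irr) + 1.0
    return (lo + hi) / 2.0

def fit_threshold(scores, labels):
    targets = [[threshold_target(s, y)] for s, y in zip(scores, labels)]
    return fit_linear(scores, targets)

def predict_threshold(weights, score_row):
    t = apply_linear(weights, score_row)[0]
    return [int(s > t) for s in score_row]

# Method 2 (sketch): a linear operator from class scores to +/-1 decision values.
def fit_operator(scores, labels):
    targets = [[1.0 if y else -1.0 for y in row] for row in labels]
    return fit_linear(scores, targets)

def predict_operator(weights, score_row):
    return [int(v > 0) for v in apply_linear(weights, score_row)]

# Toy relevance scores for 3 classes, as if produced by some ranking method.
SCORES = [[0.9, 0.8, 0.1], [0.4, 0.1, 0.05], [0.6, 0.7, 0.5]]
LABELS = [[1, 1, 0], [1, 0, 0], [0, 1, 0]]

if __name__ == "__main__":
    w1 = fit_threshold(SCORES, LABELS)
    w2 = fit_operator(SCORES, LABELS)
    for s in SCORES:
        print(predict_threshold(w1, s), predict_operator(w2, s))
```

In the toy data a single fixed cut-off such as 0.5 would mislabel the second and third documents, which is the motivation for letting the threshold depend on the score vector. Both fits reproduce the training labels here only because there are fewer documents than parameters; the example says nothing about generalisation or about the accuracy reported in the paper.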
Similar resources
A Comparison of Text Categorization Methods
In this paper, I first compare Single Label Text Categorization with Multi Label Text Categorization in detail, and then compare Document Pivoted Categorization with Category Pivoted Categorization in detail. For this purpose, I give the general definition of Text Categorization with its mathematical notation, for the sake of frugality and cost effectiveness. Then with the ...
Ranking With Cluster-Based Non-Segregated Approach to Multi-Document Categorization
Summarization of one or more documents aims to create a strong summary while retaining the main characteristics of the original set of documents. The summary should cover a number of topics, with each theme represented by a cluster of highly related sentences. Sentence clustering is used; it directly generates clusters integrated with ranking. Ranking distribution for sentences in each and every cluster is diff...
A Systematic Review of Banking Business Models with an Approach to Sustainable Development
Modern banks have shifted their function as purely administrative, economic and industrial entities into socio-political institutions that must be sensitive to the surrounding environment. This function has always been neglected. This study was conducted based on primary, secondary, and tertiary data and reviews the full text of 75 studies selected from more than 245 studies. The selected elect...
Keyword Extraction for Text Characterization
Keywords are a valuable means of characterizing texts. In order to extract keywords, we propose an efficient and robust, language- and domain-independent approach which is based on small word parts (quadgrams). The basic algorithm can be improved by re-examining and re-ranking keywords using edit distance (i.e. Levenshtein distance) and an algorithm based on the relativistic addition of velocities ...
A Probabilistic Approach to Feature Selection for Multi-class Text Categorization
In this paper, we propose a probabilistic approach to feature selection for multi-class text categorization. Specifically, we regard document class and occurrence of each feature as events, calculate the probability of occurrence of each feature by the theorem on the total probability and utilize the values as a ranking criterion. Experiments on Reuters-2000 collection show that the proposed me...
Publication date: 2007